Figure 1
Figure 2

Alignment

Alignment Type:        Local
Sequence A Range:      1 -> 111
Sequence B Range:      1 -> 78
Gap Open Penalty:      -11
Gap Extend Penalty:    -1
Scoring Matrix:        /usr/local/BLOSUM62
Profile A:             ../gtws_files/profiles/ld9gBB00.pro
Sequence B:            /tmp/gtw_6314.fa
DB Alignment:          -
GT Alignment:          -
View Alignment:        Yes
Reverse GT Alignment:  No

         Score  Length  Num_ID  No.+ve  Ovrlp  %ID   %+ve  From  To  From  To
SCORES:  60     37      14      19      68     37.8  51.4  20    55  30    66

         Length1  Length2  Normalised-Score
SCORE2:  111      78       38.210598
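The %ID and %+ve values in the SCORES row appear to be the identity and positive counts divided by the aligned length. The following minimal Python sketch checks that arithmetic; the function name `percent` and the assumption that the alignment program rounds to one decimal place are illustrative and are not taken from the figure.

```python
def percent(count: int, length: int) -> float:
    """Return count/length as a percentage, rounded to one decimal place."""
    if length <= 0:
        raise ValueError("alignment length must be positive")
    return round(100.0 * count / length, 1)


# Values copied from the SCORES row of the alignment summary.
ALIGNED_LENGTH = 37   # Length
NUM_IDENTICAL = 14    # Num_ID
NUM_POSITIVE = 19     # No.+ve

pct_id = percent(NUM_IDENTICAL, ALIGNED_LENGTH)
pct_pos = percent(NUM_POSITIVE, ALIGNED_LENGTH)
print(pct_id, pct_pos)
```

Under these assumptions the two values reproduce the %ID and %+ve columns of the figure.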



                   10|       20|       30|       40|        50|       60
ld9gBB00  **qffreienlkeyfnggplfSEILKNWKDESDKKIIQSQIVS-FYFKLFENLKDNQViqrs

IPAAA445  mtspnelnklpwtnpgeteicdlsdtefkISVLKNLKEIQDNTEKESRILSDKYKKQIEIIKGNQAeile
                 10|       20|       30|       40|       50|       60|       70|

          |       70|       80|       90|      100|      110|
ld9gBB00  mdiikqdmfqkflngssekledfkkliqipvddlqiqrkainelikvrandls

IPAAA445  lrnadgtl-
Figure 3

INSP037 (IPAAA44548) Predicted sequence with translation product:

  1 TGCCTAGACA CCAAAGAACA ACTATTA [shaded primer region, illegible in source] AAC ATGACTTCAC CAAACGAACT
    m t s p n e

 71 AAATAAGCTG CCATGGACCA ATCCTGGAGA AACAGAGATA TGTGACCTTT CAGACACAGA ATTCAAAATA
    l n k l p w t n p g e t e i c d l s d t e f k i

141 TCTGTGTTGA AGAACCTCAA AGAAATTCAA GATAACACAG AGAAGGAATC CAGAATTCTA TCAGACAAAT
    s v l k n l k e i q d n t e k e s r i l s d k

211 ATAAGAAACA GATTGAAATA ATTAAAGGGA ATCAAGCAGA AATTCTGGAG TTGAGAAATG C [shaded primer region, illegible in source]
    y k k q i e i i k g n q a e i l e l r n a d g

281 [shaded primer region, illegible in source] CATAAGAGT CTTTTTATAG CAGAATTCAT CAAGCAGAAG AAAGAAT
    t l *



The position of primers is denoted by the shaded boxes above. 
Figure 4

INSP037 (IPAAA44548) Cloned sequence with translation

  1 GCATCAACAA CATCCAGTAA AACATGACTT CACCAAACGA ACTAAATAAG CTGCCATGGA CCAATCCTGG
    m t s p n e l n k l p w t n p

 71 AGAAACAGAG ATATGTGACC TTTCAGACAC AGAATTCAAA ATATCTGTGT TGAAGAACCT CAAGGAAATT
    g e t e i c d l s d t e f k i s v l k n l k e i

141 CAAGATAACA GAGAGAAGGA ATCCAGAATT CTATCAGACA AATATAAGAA ACAGATTGAA ATAATTAAAG
    q d n t e k e s r i l s d k y k k q i e i i k

211 GGAATCAAGG AGAAATTCTG GAGTTGAGAA ATGCAGATGG CACACTTTAG AATG
    g n q a e i l e l r n a d g t l
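The translation shown under the cloned sequence can be checked with a short script. The sketch below is a minimal illustration, assuming the standard genetic code and that translation starts at the first ATG; the function name `translate_from_first_atg` is hypothetical, and only the first row of the figure is used as input.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (first base slowest).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate_from_first_atg(dna: str) -> str:
    """Translate from the first ATG until a stop codon or an incomplete codon."""
    seq = "".join(dna.split()).upper()  # drop the spaces between 10-base groups
    start = seq.find("ATG")
    if start < 0:
        return ""
    protein = []
    for i in range(start, len(seq) - 2, 3):
        aa = CODON_TABLE[seq[i:i + 3]]
        if aa == "*":  # stop codon
            break
        protein.append(aa)
    return "".join(protein)


# First row (positions 1-70) of the cloned sequence in Figure 4.
ROW_1 = "GCATCAACAA CATCCAGTAA AACATGACTT CACCAAACGA ACTAAATAAG CTGCCATGGA CCAATCCTGG"
print(translate_from_first_atg(ROW_1))
```

The result corresponds to the lowercase residues printed under the first row of the figure.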
Figure 5

Map of pCRII-TOPO-IPAAA44548

Molecule:     pCRII-TOPO-IPAAA44548, 4214 bps DNA Circular
File Name:    13124.cm5
Description:  Plasmid ID 13124

Molecule Features:

Type    Start  End      Name       Description
MARKER  239             SP6
REGION  337    600                 IPAAA44548 cloned
GENE    577    341   C  44548 cds
MARKER  670          C  T7
REGION  854    1268     f1 ori
GENE    1602   2396     KanR
GENE    2414   3274     AmpR
REGION  3419   4092     pUC ori
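The Start and End coordinates in a feature table give each feature's length, which can be compared against the expected coding length. The sketch below is a hedged illustration only: the `Feature` class is hypothetical, and reading the "C" flag as "complementary strand" is an assumption based on the reversed coordinates, not something the figure states.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Feature:
    kind: str
    start: int
    end: int
    complementary: bool  # assumed meaning of the "C" flag in the table
    name: str

    @property
    def length(self) -> int:
        """Inclusive feature length in base pairs, regardless of orientation."""
        return abs(self.end - self.start) + 1


# The "44548 cds" row of the pCRII-TOPO-IPAAA44548 feature table.
cds = Feature("GENE", 577, 341, True, "44548 cds")
codons, remainder = divmod(cds.length, 3)
print(cds.length, codons, remainder)
```

A length that is a whole number of codons (here 79, which would fit 78 residues plus a stop codon) is consistent with the 78-letter INSP037 query used in the BLAST figures.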






Figure 6
Map of expression vector pEAK12d

Molecule:     pEAK12d, 8760 bps DNA Circular
File Name:    pEAK12DEST.cm5
Description:  Mammalian cell expression vector (plasmid ID 11345)

Molecule Features:

Type    Start  End      Name       Description
REGION  2      595                 pmb-ori
GENE    596    1519     Amp
REGION  1690   2795     EF-1alpha
REGION  2703   2722                position of pEAK12F
REGION  2796   2845                MCS
MARKER  2855            attR1
GENE    3256   3915     CmR
GENE    4257   4562     ccdB
MARKER  4603         C  attR2
REGION  4733   4733                MCS
REGION  4734   5162                poly A/splice
REGION  4819   4848  C             position of pEAK12R
GENE    5781   5163  C  PUR        PUROMYCIN
REGION  6005   5782  C  tK         tK promoter
REGION  6500   6006  C  Ori P
GENE    8552   6500  C  EBNA-1
REGION  8553   8752     sv40
Figure 7

Map of plasmid pDONR201

Molecule:     pDONR201, 4470 bps DNA Circular
File Name:    pDONR201.cm5, dated 17 Oct 2002
Description:  Gateway entry vector (Invitrogen) - plasmid ID# 13309

Molecule Features:

Type    Start  End    Name
REGION  332    563    attP1
GENE    959    1264   ccdB
REGION  2513   2744   attP2
GENE    2868   3677   KanR
REGION  3794   4467   pUC ori


[Plasmid map; legible restriction-site labels: NspI 4464, BssSI 4291, HaeII 4220, HpaI 304, PspOMI 325]





Figure 8
Map of expression vector pEAK12d-IPAAA44548-6HIS

Molecule:     pEAK12d-IPAAA44548-6HIS, 7201 bps DNA Circular
File Name:    11775.cm5
Description:  Mammalian cell Expression Construct

Molecule Features:

Type    Start  End      Name       Description
REGION  2      595                 pmb-ori
GENE    596    1519     Amp
REGION  1690   2795     EF-1a
REGION  2703   2722                pEAK12D-F primer
MARKER  2855            attB1
GENE    2888   3139     IPAAA44548 -6HIS
MARKER  3155            attB2
REGION  3175   3603                poly A/splice
REGION  3289   3270  C             pEAK12D-R primer
GENE    4222   3604  C             PUROMYCIN
REGION  4446   4223  C  tK         tK promoter
REGION  4941   4447  C  Ori P
GENE    6993   4941  C  EBNA-1
REGION  6994   7193     sv40






Figure 9
Map of E.coli expression vector pDEST14

Molecule:     pDEST14, 6422 bps DNA Circular
File Name:    pDEST14.cm5, dated 17 Oct 2002
Description:  E.coli expression vector (Invitrogen)
Notes:        Gateway compatible, Expression under control of T7 promoter

Molecule Features:

Type    Start  End      Name       Description
MARKER  21              T7         Promoter
REGION  67     191      attR1
GENE    441    1100     CmR
GENE    1442   1747     ccdB
REGION  1788   1912     attR2
REGION  1964   1944  C             pDEST14 R primer
GENE    2638   3498     AmpR
REGION  3643   4316     pBR322 ori






Figure 10

Map of plasmid pDEST14-IPAAA44548-6HIS

Molecule:     pDEST14-IPAAA44548-6HIS, 4899 bps DNA Circular
File Name:    12896.cm5
Description:  plasmid ID 12896

Molecule Features:

Type    Start  End      Name       Description
MARKER  21              T7
REGION  72     67    C  attB1
REGION  94     108                 Shine Dalgarno Sequence
GENE    109    360      IPAAA44548 -6HIS
REGION  376    389      attB2
REGION  441    421   C             pDEST14-R primer
GENE    1115   1975     Amp
REGION  2124   2763     ori        pBR322 ori

Figure 11 pCRII-TOPO-IPAAA44548

1 AGCGCCCAAT ACGCAAACCG CCTCTCCCGG CGCGTTGGCG GATTCATTAA TGCAGCTGGC 

61 ACGACAGGTT TCCCGACTGG AAAGCGGGCA GTGAGCGCAA CGCAATTAAT GTGAGTTAGC 

121 TCACTCATTA GGCACCCCAG GCTTTACACT TTATGCTTCC GGCTCGTATG TTGTGTGGAA 

181 TTGTGAGCGG ATAACAATTT CACACAGGAA ACAGCTATGA CCATGATTAC GCCAAGCTAT 

241 TTAGGTGACA CTATAGAATA CTCAAGCTAT GCATCAAGCT TGGTACCGAG CTCGGATCCA 

301 CTAGTAACGG CCGCCAGTGT GCTGGAATTC GCCGTTCATT CTAAAGTGTG CCATCTGCAT 

361 TTCTCAACTC CAGAATTTCT GCTTGATTCC CTTTAATTAT TTGAATCTGT TTCTTATATT
421 TGTCTGATAG AATTCTGGAT TCCTTCTCTG TGTTATCTTG AATTTCCTTG AGGTTCTTCA 

481 ACACAGATAT TTTGAATTCT GTGTCTGAAA GGTCACATAT CTCTGTTTGT CCAGGATTGG
541 TCCATGGCAG CTTATTTAGT TCGTTTGGTG AAGTCATGTT TTACTGGATG TTGTTGATGC 
601 AAGGGCGAAT TCTGCAGATA TCCATCACAC TGGCGGCCGC TCGAGCATGC ATCTAGAGGG 
661 CCCAATTCGC CCTATAGTGA GTCGTATTAC AATTCACTGG CCGTCGTTTT ACAACGTCGT 
721 GACTGGGAAA ACCCTGGCGT TACCCAACTT AATCGCCTTG CAGCACATCC CCCTTTCGCC 
781 AGCTGGCGTA ATAGCGAAGA GGCCCGCACC GATCGCCCTT CCCAACAGTT GCGCAGCCTG
841 AATGGCGAAT GGGACGCGCG CTGTAGGGGC GGATTAAGCG CGGCGGGTGT GGTGGTTACG
901 CGCAGCGTGA CCGCTACACT TGCCAGCGCC CTAGCGCCCG CTCCTTTCGC TTTCTTCCCT 
961 TCCTTTCTGG CCACGTTCGC CGGCTTTCCC CGTCAAGCTC TAAATCGGGG GCTCCCTTTA 

1021 GGGTTCCGAT TTAGAGCTTT ACGGCACCTC GACCGCAAAA AACTTGATTT GGGTGATGGT 

1081 TCACGTAGTG GGCCATCGCC CTGATAGACG GTTTTTCGCC CTTTGACGTT GGAGTCCACG 

1141 TTCTTTAATA GTGGACTCTT GTTCCAAACT GGAACAACAC TCAACCCTAT CGCGGTCTAT

1201 TCTTTTGATT TATAAGGGAT TTTGCCGATT TCGGCGTATT GGTTAAAAAA TGAGCTGATT

1261 TAACAAATTC AGGGCGCAAG GGCTGCTAAA GGAACCGGAA CACGTAGAAA GCCAGTCCGC
1321 AGAAACGGTG CTGACCCCGG ATGAATGTCA GCTACTGGGC TATCTGGACA AGGGAAAACG 

1381 CAAGCGCAAA GAGAAAGCAG GTAGCTTGCA GTGGGCTTAC ATGGCGATAG CTAGACTGGG
1441 CGGTTTTATG GACAGCAAGC GAACCGGAAT TGCCAGCTGG GGCGCCCTCT GGTAAGGTTG
1501 GGAAGCCCTG CAAAGTAAAC TGGATGGCTT TCTTGCCGCC AAGGATCTGA TGGCGCAGGG

1561 GATCAAGATC TGATCAAGAG ACAGGATGAG GATCGTTTCG CATGATTGAA GAAGATGGAT 

1621 TGCAGGCAGG TTCTCCGGCC GCTTGGGTGG AGAGGCTATT CGGCTATGAC TGGGCACAAC 

1681 AGACAATCGG CTGCTCTGAT GCCGCCGTGT TCCGGCTGTC AGCGCAGGGG CGCCCGGTTC 

1741 TTTTTGTCAA GACCGACCTG TCCGGTGCCC TGAATGAACT GCAGGACGAG GCAGCGCGGC 

1801 TATCGTGGCT GGCCACGACG GGCGTTCCTT GCGCAGCTGT GCTCGACGTT GTCACTGAAG 

1861 CGGGAAGGGA CTGGCTGCTA TTGGGCGAAG TGCCGGGGCA GGATCTCCTG TCATCTCGCC 

1921 TTGCTCCTGC CGAGAAAGTA TCCATCATGG CTGATGCAAT GCGGCGGGTG CATACGCTTG 

1981 ATCCGGCTAC CTGCCCATTC GACCACCAAG CGAAACATCG CATCGAGCGA GCACGTACTC 

2041 GGATGGAAGC CGGTCTTGTG GATCAGGATG ATCTGGACGA AGAGCATCAG GGGCTCGCGC

2101 CAGCCGAACT GTTCGCGAGG CTCAAGGCGC GCATGCCCGA CGGCGAGGAT CTCGTCGTGA

2161 TCCATGGCGA TGCCTGCTTG CCGAATATCA TGGTGGAAAA TGGCCGCTTT TCTGGATTCA 

2221 ACGACTGTGG CCGGCTGGGT GTGGCGGACC GCTATCAGGA CATAGCGTTG GATACCCGTG 

2281 ATATTGCTGA AGAGCTTGGC GGCGAATGGG CTGACCGCTT CCTCGTGCTT TACGGTATCG 

2341 CCGCTCCCGA TTCGCAGCGC ATCGCCTTCT ATCGCCTTCT TGACGAGTTC TTCTGAATTG

2401 AAAAAGGAAG AGTATGAGTA TTCAACATTT CCGTGTCGCC CTTATTCCCT TTTTTGCGGC

2461 ATTTTGCCTT CCTGTTTTTG CTCACCCAGA AACGCTGGTG AAAGTAAAAG ATGCTGAAGA 

2521 TCAGTTGGGT GCACGAGTGG GTTACATCGA ACTGGATCTC AACAGCGGTA AGATGCTTGA 

2581 GAGTTTTCGC CCCGAAGAAC GTTTTCCAAT GATGAGCACT TTTAAAGTTC TGCTATGTGA

2641 TACACTATTA TCCCGTATTG ACGCCGGGCA AGAGCAACTC GGTCGGCGCA TACACTATTC

2701 TCAGAATGAC TTGGTTGAGT ACTCACCAGT CACAGAAAAG CATCTTACGG ATGGCATGAC

2761 AGTAAGAGAA TTATGCAGTG CTGCCATAAC CATGAGTGAT AACACTGCGG CCAACTTACT

2821 TCTGACAACG ATCGGAGGAC CGAAGGAGCT AACCGCTTTT TTGCACAACA TGGGGGATCA

2881 TGTAACTCGC CTTGATCGTT GGGAACCGGA GCTGAATGAA GCCATACCAA ACGACGAGAG

2941 TGACACCACG ATGCGTGTAG CAATGCCAAC AACGTTGCGG AAACTATTAA CTGGCGAACT

3001 ACTTAGTCTA GCTTCCCGGC AACAATTAAT AGACTGAATG GAGGCGGATA AAGTTGCAGG

3061 ACCAGTTCTG CGCTCGGCCC TTCCGGCTGG CTGGTTTATT GCTGATAAAT CTGGAGCCGG

3121 TGAGCGTGGG TCTCGCGGTA TCATTGCAGC ACTGGGGCCA GATGGTAAGC GCTCCCGTAT

3181 CGTAGTTATC TACACGACGG GGAGTCAGGC AACTATGGAT GAACGAAATA GACAGATCGC 

3241 TGAGATAGGT GCCTCACTGA TTAAGCATTG GTAACTGTCA GACCAAGTTT ACTCATATAT 

3301 ACTTTAGATT GATTTAAAAG TTCATTTTTA ATTTAAAAGG ATCTAGGTGA AGATCCTTTT 

3361 TGATAATCTC ATGACCAAAA TCCCTTAACG TGAGTTTTCG TTCCACTGAG CGTCAGACCC

3421 CGTAGAAAAG ATCAAAGGAT CTTCTTGAGA TCCTTTTTTT CTGCGCGTAA TCTGCTGCTT 

3481 GCAAACAAAA AAAGCACCGC TACCAGCGGT GGTTTGTTTG CCGGATCAAG AGCTACCAAC

3541 TCTTTTTCCG AAGGTAACTG GCTTCAGCAG AGCGCAGATA CCAAATACTG TCCTTCTAGT

3601 GTAGCCGTAG TTAGGCCAGC ACTTCAAGAA CTCTGTAGCA CCGCCTACAT ACCTCGCTCT

3661 GCTAATCCTG TTACCAGTGG CTGCTGCCAG TGGCGATAAG TCGTGTCTTA CCGGGTTGGA

3721 GTCAAGACGA TAGTTACCGG ATAAGGCGCA GCGGTCGGGC TGAACGGGGG GTTCGTGCAC

3781 ACAGCCCAGC TTGGAGCGAA CGACCTACAC CGAACTGAGA TACCTACAGC GTGAGCTATG

3841 AGAAAGCGCC ACGCTTCCGG AAGGGAGAAA GGCGGACAGG TATCCGGTAA GCGGCAGGGT

3901 CGGAACAGGA GAGCGCACGA GGGAGCTTCC AGGGGGAAAC GCCTGGTATC TTTATAGTCC

3961 TGTCGGGTTT CGCCACCTCT GACTTGAGCG TCGATTTTTG TGATGCTCGT CAGGGGGGCG

4021 GAGCCTATGG AAAAACGCCA GCAACGCGGC CTTTTTACGG TTCCTGGGCT TTTGCTGGCC

4081 TTTTGCTCAC ATGTTCTTTC CTGCGTTATC CCCTGATTCT GTGGATAACC GTATTACCGC

4141 CTTTGAGTGA GCTGATACCG CTCGCCGCAG CCGAACGACC GAGCGCAGCG AGTCAGTGAG

4201 CGAGGAAGCG GAAG



Figure 12
pDEST14-IPAAA44548-6HIS



1 AGATCTCGAT CCCGCGAAAT TAATACGACT CACTATAGGG AGACCACAAC GGTTTCCCTC TAGATCACAA GTTTGTACAA 

81 AAAAGCAGGC TTCGAAGGAG ATATACATAT GACTTCACCA AACGAACTAA ATAAGCTGCC ATGGACCAAT CCTGGAGAAA

161 CAGAGATATG TGACCTTTCA GAGACAGAAT TCAAAATATC TGTGTTGAAG AACCTCAAGG AAATTCAAGA TAACACAGAG 

241 AAGGAATCCA GAATTCTATC AGACAAATAT AAGAAACAGA TTGAAATAAT TAAAGGGAAT CAAGCAGAAA TTCTGGAGTT 

321 GAGAAATGCA GATGGCACAC TTCACCATCA GCATCACCAT TGAAAGCCAG CTTTCTTGTA CAAAGTGGTG ATGATCCGGC 

401 TGCTAACAAA GCCCGAAAGG AAGCTGAGTT GGCTGCTGCC ACCGCTGAGC AATAACTAGC ATAACCCCTT GGGGCCTCTA 

481 AACGGGTCTT GAGGGGTTTT TTGCTGAAAG GAGGAACTAT ATCCGGATAT CCACAGGACG GGTGTGGTCG CCATGATCGC 

561 GTAGTCGATA GTGGCTCCAA GTAGCGAAGC GAGCAGGACT GGGCGGCGGC CAAAGCGGTC GGACAGTGCT CCGAGAACGG

641 GTGCGCATAG AAATTGCATC AACGCATATA GCGCTAGCAG CACGCCATAG TGACTGGCGA TGCTGTCGGA ATGGACGATA

721 TCCCGCAAGA GGCCCGGCAG TACCGGCATA ACCAAGCCTA TGCCTACAGC ATCCAGGGTG ACGGTGCCGA GGATGACGAT

801 GAGCGCATTG TTAGATTTCA TACACGGTGC CTGACTGCGT TAGCAATTTA ACTGTGATAA ACTACCGCAT TAAAGCTTAT

881 CGATGATAAG CTGTCAAACA TGAGAATTCT TGAAGACGAA AGGGCCTCGT GATACGCCTA TTTTTATAGG TTAATGTCAT 

961 GATAATAATG GTTTCTTAGA CGTCAGGTGG CACTTTTCGG GGAAATGTGC GCGGAACCCC TATTTGTTTA TTTTTCTAAA

1041 TACATTCAAA TATGTATCCG CTCATGAGAC AATAACCCTG ATAAATGCTT CAATAATATT GAAAAAGGAA GAGTATGAGT 

1121 ATTCAACATT TCCGTGTCGC CCTTATTCCC TTTTTTGGGG CATTTTGCCT TCCTGTTTTT GCTCACCCAG AAACGCTGGT 

1201 GAAAGTAAAA GATGCTGAAG ATCAGTTGGG TGCACGAGTG GGTTACATCG AACTGGATCT CAACAGCGGT AAGATCCTTG 

1281 AGAGTTTTCG CCCCGAAGAA CGTTTTCCAA TGATGAGCAC TTTTAAAGTT CTGCTATGTG GCGCGGTATT ATCCCGTGTT 

1361 GACGCCGGGC AAGAGCAACT CGGTCGCCGC ATACACTATT CTCAGAATGA CTTGGTTGAG TACTCACCAG TCACAGAAAA

1441 GCATCTTACG GATGGCATGA CAGTAAGAGA ATTATGCAGT GCTGCCATAA CCATGAGTGA TAACACTGCG GCCAACTTAC 

1521 TTCTGACAAC GATCGGAGGA CCGAAGGAGC TAACCGCTTT TTTGCACAAC ATGGGGGATC ATGTAACTCG CCTTGATCGT 

1601 TGGGAACCGG AGCTGAATGA AGCCATACCA AACGACGAGC GTGACACCAC GATGCCTGCA GCAATGGCAA CAACGTTGCG 

1681 GAAACTATTA ACTGGCGAAC TACTTACTCT AGCTTCGCGG CAACAATTAA TAGACTGGAT GGAGGCGGAT AAAGTTGCAG 

1761 GACCACTTCT GCGCTCGGCC GTTCCGGCTG GCTGGTTTAT TGCTGATAAA TCTGGAGCCG GTGAGCGTGG GTCTCGCGGT

1841 ATCATTGCAG CACTGGGGCC AGATGGTAAG CCCTCCCGTA TCGTAGTTAT CTACACGACG GGGAGTCAGG CAACTATGGA

1921 TGAACGAAAT AGACAGATCG CTGAGATAGG TGCCTCACTG ATTAAGCATT GGTAACTGTC AGACCAAGTT TACTCATATA

2001 TACTTTAGAT TGATTTAAAA CTTCATTTTT AATTTAAAAG GATCTAGGTG AAGATCCTTT TTGATAATGT CATGACCAAA

2081 ATCCCTTAAC GTGAGTTTTC GTTCCACTGA GCGTCAGACC CCGTAGAAAA GATCAAAGGA TCTTCTTGAG ATCCTTTTTT 

2161 TCTGCGCGTA ATCTGCTGCT TGCAAACAAA AAAACCACCG CTACCAGCGG TGGTTTGTTT GCCGGATCAA GAGCTACCAA 

2241 CTCTTTTTCC GAAGGTAACT GGCTTCAGCA GAGCGCAGAT ACCAAATACT GTCCTTCTAG TGTAGCCGTA GTTAGGCCAC

2321 CACTTCAAGA ACTCTGTAGC ACCGCCTACA TACCTCGCTC TGCTAATCCT GTTACCAGTG GCTGCTGCCA GTGGCGATAA

2401 GTCGTGTCTT ACCGGGTTGG ACTCAAGACG ATAGTTACCG GATAAGGCGC AGCGGTCGGG GTGAACGGGG GGTTCGTGCA

2481 CACAGCCCAG CTTGGAGCGA ACGACCTACA CCGAACTGAG ATACCTACAG CGTGAGCTAT GAGAAAGCGC CACGCTTCCC

2561 GAAGGGAGAA AGGCGGACAG GTATCCGGTA AGCGGCAGGG TCGGAACAGG AGAGCGCACG AGGGAGCTTC CAGGGGGAAA 

2641 CGCCTGGTAT CTTTATAGTC CTGTCGGGTT TCGCCACCTC TGACTTGAGC GTCGATTTTT GTGATGCTCG TCAGGGGGGC 

2721 GGAGCCTATG GAAAAACGCC AGCAACGCGG CCTTTTTACG GTTCCTGGCC TTTTGCTGGC CTTTTGCTCA CATGTTCTTT 

2801 CCTGCGTTAT CCCCTGATTC TGTGGATAAC CGTATTACCG CCTTTGAGTG AGCTGATACC GCTCGCCGCA GCCGAACGAC 

2881 CGAGCGCAGC GAGTCAGTGA GCGAGGAAGC GGAAGAGCGC CTGATGCGGT ATTTTCTCCT TACGCATCTG TGCGGTATTT 

2961 CACACCGCAT ATATGGTGCA CTCTCAGTAC AATCTGCTCT GATGCCGCAT AGTTAAGCCA GTATACACTC CGCTATCGCT 

3041 ACGTGAGTGG GTCATGGCTG CGCCCCGACA CCCGCCAACA CCCGCTGACG CGCCCTGACG GGCTTGTCTG CTCCCGGCAT 

3121 CCGCTTACAG ACAAGCTGTG ACCGTCTCCG GGAGCTGCAT GTGTGAGAGG TTTTCACCGT CATCACCGAA ACGCGCGAGG 

3201 CAGCTGCGGT AAAGCTCATC AGCGTGGTGG TGAAGCGATT CACAGATGTC TGCCTGTTCA TCCGCGTCCA GCTCGTTGAG

3281 TTTCTCCAGA AGCGTTAATG TCTGGCTTCT GATAAAGCGG GCCATGTTAA GGGCGGTTTT TTCCTGTTTG GTCACTGATG

3361 CCTCCGTGTA AGGGGGATTT CTGTTCATGG GGGTAATGAT ACCGATGAAA CGAGAGAGGA TGCTCACGAT ACGGGTTACT

3441 GATGATGAAC ATGCCCGGTT ACTGGAACGT TGTGAGGGTA AACAACTGGC GGTATGGATG CGGCGGGACC AGAGAAAAAT

3521 CACTCAGGGT CAATGCCAGC GCTTCGTTAA TACAGATGTA GGTGTTCCAC AGGGTAGCCA GGAGCATCCT GCGATGCAGA 

3601 TCCGGAACAT AATGGTGCAG GGCGCTGACT TCCGCGTTTC CAGACTTTAC GAAACACGGA AACCGAAGAC CATTCATGTT 

3681 GTTGCTCAGG TCGCAGACGT TTTGCAGCAG CAGTCGCTTC ACGTTCGCTC GCGTATCGGT GATTCATTCT GCTAACCAGT 

3761 AAGGCAACCC CGCCAGCCTA GCCGGGTCCT CAACGACAGG AGCACGATCA TGCGCACCCG TGGCCAGGAC CCAACGCTGC

3841 CCGAGATGCG CCGCGTGCGG CTGCTGGAGA TGGCGGACGC GATGGATATG TTCTGCCAAG GGTTGGTTTG CGCATTCACA

3921 GTTCTCCGCA AGAATTGATT GGCTCCAATT CTTGGAGTGG TGAATCCGTT AGCGAGGTGC CGCCGGCTTC CATTCAGGTC 

4001 GAGGTGGCCC GGCTCCATGC ACCGCGACGC AACGCGGGGA GGCAGACAAG GTATAGGGCG GCGCCTACAA TCCATGCCAA 

4081 CCCGTTCCAT GTGCTCGCCG AGGCGGCATA AATCGCCGTG ACGATCAGCG GTCCAGTGAT CGAAGTTAGG CTGGTAAGAG 

4161 CCGCGAGCGA TCCTTGAAGC TGTCCCTGAT GGTCGTCATC TAGCTGCCTG GACAGCATGG CCTGCAACGC GGGCATCCCG

4241 ATGCCGCCGG AAGCGAGAAG AATCATAATG GGGAAGGCCA TCCAGCCTCG GGTCGCGAAC GCCAGCAAGA CGTAGCCCAG 

4321 CGCGTCGGCC GCCATGCCGG CGATAATGGC CTGCTTCTCG CCGAAACGTT TGGTGGCGGG ACCAGTGACG AAGGCTTGAG

4401 CGAGGGCGTG CAAGATTCCG AATACCGCAA GCGACAGGCC GATCATCGTC GCGCTCCAGC GAAAGCGGTC CTCGCCGAAA

4481 ATGACCCAGA GCGCTGCCGG CACCTGTCCT ACGAGTTGCA TGATAAAGAA GACAGTCATA AGTGCGGCGA CGATAGTCAT

4561 GCCCCGCGCC CACCGGAAGG AGCTGACTGG GTTGAAGGCT CTCAAGGGCA TCGGTCGATC GACGCTCTCC CTTATGCGAC

4641 TCCTGCATTA GGAAGCAGCC CAGTAGTAGG TTGAGGCCGT TGAGCACCGC CGCCGCAAGG AATGGTGCAT GCAAGGAGAT 

4721 GGCGCCCAAC AGTCCCCCGG CCACGGGGCC TGCCACCATA CCCACGCCGA AACAAGCGCT CATGAGCCCG AAGTGGCGAG 

4801 CCCGATCTTC CCCATCGGTG ATGTCGGCGA TATAGGCGCC AGCAACCGCA CCTGTGGCGC CGGTGATGCC GGCCACGATG 

4881 CGTCCGGCGT AGAGGATCG 
Figure 13 pEAK12D-IPAAA44548-6HIS



1 GGCGTAATCT GCTGCTTGCA AACAAAAAAA CCACCGCTAC CAGCGGTGGT TTGTTTGCCG GATCAAGAGC TACCAACTCT 

81 TTTTCCGAAG GTAACTGGCT TCAGCAGAGC GCAGATACCA AATACTGTCC TTCTAGTGTA GCCGTAGTTA GGCCACCACT 

161 TCAAGAACTC TGTAGCACCG CCTACATACC TCGGTCTGCT GAAGCCAGTT ACCAGTGGCT GCTGCCAGTG GCGATAAGTC 

241 GTGTCTTACC GGGTTGGACT CAAGAGATAG TTACCGGATA AGGCGCAGCG GTCGGGCTGA ACGGGGGGTT CGTGCACACA 

321 GCCCAGCTTG GAGCGAACGA CCTACACCGA ACTGAGATAC CTACAGCGTG AGCTATGAGA AAGCGCCACG CTTCCCGAAG 

401 GGAGAAAGGC GGACAGGTAT CCGGTAAGCG GCAGGGTCGG AACAGGAGAG CGCACGAGGG AGCTTCCAGG GGGAAACGCC

481 TGGTATCTTT ATAGTCCTGT CGGGTTTCGC CACCTCTGAC TTGAGCGTCG ATTTTTGTGA TGCTCGTCAG GGGGGGGGAG 

561 CCTATGGAAA AACGCCAGCA ACGCAAGCTA GAGTTTAAAC TTGACAGATG AGACAATAAC CCTGATAAAT GCTTCAATAA

641 TATTGAAAAA GGAAAAGTAT GAGTATTCAA CATTTCCGTG TCGCCCTTAT TCGCTTTTTT GCGGCATTTT GCCTTCCTGT 

721 TTTTGCTCAC CCAGAAACGC TGGTGAAAGT AAAAGATGCA GAAGATCACT TGGGTGGGCG AGTGGGTTAC ATCGAACTGG 

801 ATCTCAACAG CGGTAAGATC CTTGAGAGTT TTCGCCCCGA AGAACGTTTC CCAATGATGA GCACTTTTAA AGTTCTGCTA

881 TGTGGCGCGG TATTATCCCG TATTGATGCC GGGCAAGAGC AACTCGGTCG CCGCATACAC TATTCTCAGA ATGACTTGGT 

961 TGAATACTCA CCAGTCACAG AAAAGCATCT TACGGATGGC ATGACAGTAA GAGAATTATG CAGTGCTGCC ATAACCATGA

1041 GTGATAACAC TGCGGCCAAC TTACTTCTGA CAACTATCGG AGGACCGAAG GAGCTAACCG CTTTTTTGCA CAACATGGGG

1121 GATCATGTAA CTCGCCTTGA TCGTTGGGAA CCGGAGCTGA ATGAAGCCAT ACCAAACGAC GAGCGTGACA CCACGATGCC

1201 TGTAGCAATG GCAACAACGT TGCGAAAACT ATTAACTGGC GAACTACTTA CTCTAGCTTC CCGGCAACAA CTAATAGACT 

1281 GGATGGAGGC GGATAAAGTT GCAGGACCAC TTCTGCGCTC GGCACTTCCG GCTGGCTGGT TTATTGCTGA TAAATCAGGA 

1361 GCCGGTGAGC GTGGGTCACG CGGTATCATT GCAGCACTGG GGCCGGATGG TAAGCCCTCC CGTATCGTAG TTATCTACAC 

1441 TACGGGGAGT CAGGCAACTA TGGATGAACG AAATAGACAG ATCGCTGAGA TAGGTGCCTC ACTGATTAAG CATTGGTAAG 

1521 GATAAATTTC TGGTAAGGAG GACACGTATG GAAGTGGGCA AGTTGGGGAA GCCGTATCCG TTGCTGAATC TGGCATATGT 

1601 GGGAGTATAA GACGCGCAGC GTCGCATCAG GCATTTTTTT CTGCGCCAAT GCAAAAAGGC CATCCGTCAG GATGGCCTTT

1681 CGGGATAACT AGTGAGGCTC CGGTGCCCGT CAGTGGGCAG AGCGCACATC GCCCACAGTC CCCGAGAAGT TGGGGGGAGG 

1761 GGTCGGCAAT TGAACCGGTG CCTAGAGAAG GTGGCGCGGG GTAAACTGGG AAAGTGATGT CGTGTACTGG CTCCGCCTTT 

1841 TTCCCGAGGG TGGGGGAGAA CCGTATATAA GTGCAGTAGT CGCCGTGAAC GTTCTTTTTC GCAACGGGTT TGCCGCCAGA 

1921 ACACAGGTAA GTGCCGTGTG TGGTTCCCGC GGGCCTGGCC TCTTTACGGG TTATGGCCCT TGCGTGCCTT GAATTACTTC

2001 CACCTGGCTG CAGTACGTGA TTCTTGATCC CGAGCTTCGG GTTGGAAGTG GGTGGGAGAG TTCGAGGCCT TGCGCTTAAG 

2081 GAGCCCCTTC GCCTCGTGGT TGAGTTGAGG CCTGGCCTGG GCGCTGGGGC CGCCGCGTGC GAATCTGGTG GCACCTTCGC 

2161 GCCTGTCTCG CTGCTTTCGA TAAGTCTCTA GCCATTTAAA ATTTTTGATG ACCTGCTGCG ACGCTTTTTT TCTGGCAAGA

2241 TAGTCTTGTA AATGCGGGCC AAGACGATCT GCACACTGGT ATTTCGGTTT TTGGGGCCGC GGGCGGCGAC GGGGCCCGTG

2321 CGTCCCAGCG CACATGCATG TTCGGCGAGG CGGGGCCTGC GAGCGCGGCC ACCGAGAATC GGACGGGGGT AGTCTCAAGC

2401 TGGCCGGCCT GCTGTGGTGC CTGGCCTCGC GCCGCCGTGT ATCGCCCCGC CCTGGGCGGC AAGGCTGGGA GCTCAAAATG

2481 GAGGACGCGG CGCTCGGGAG AGCGGGCGGG TGAGTCACCC ACACAAAGGA AAAGGGCCTT TCCGTCCTCA GCCGTCGCTT

2561 CATGTGACTC CACGGAGTAC CGGGCGCCGT CCAGGCACCT CGATTAGTTC TCGAGCTTTT GGAGTACGTC GTCTTTAGGT

2641 TGGGGGGAGG GGTTTTATGC GATGGAGTTT CCCCACACTG AGTGGGTGGA GACTGAAGTT AGGCCAGCTT GGCACTTGAT

2721 GTAATTCTCC TTGGAATTTG CCCTTTTTGA GTTTGGATCT TGGTTCATTG TCAAGCCTCA GACAGTGGTT CAAATTAATA

2801 CGACTCACTA TAGGGAGACT TCTTTCTCCC ATTTCAGGTG TCGTAAGCTA TCAAACAAGT TTGTACAAAA AAGCAGGCTT

2881 CGCCACCATG ACTTCACCAA ACGAACTAAA TAAGCTGCCA TGGACCAATC CTGGAGAAAC AGAGATATGT GACCTTTCAG

2961 ACACAGAATT CAAAATATCT GTGTTGAAGA ACCTCAAGGA AATTCAAGAT AACACAGAGA AGGAATCCAG AATTCTATCA

3041 GACAAATATA AGAAACAGAT TGAAATAATT AAAGGGAATC AAGCAGAAAT TCTGGAGTTG AGAAATGCAG ATGGGACACT

3121 TCACCATCAC CATCACCATT GAAACCCAGC TTTCTTGTAC AAAGTGGTTC GATGGCCGCA GGTAAGCCAG CCCAGGCCTC

3201 GCCCTCCAGC TCAAGGCGGG ACAGGTGCCC TAGAGTAGCC TGCATCCAGG GACAGGCCCC AGCCGGGTGC TGACACGTCC

3281 ACCTCCATCT CTTCCTCAGG TCTGCCCGGG TGGCATCCCT GTGACCCCTC CCCAGTGCCT CTCCTGGTCG TGGAAGGTGC

3361 TACTCCAGTG CCCACCAGCC TTGTCGTAAT AAAATTAAGT TGCATCATTT TGTTTGACTA GGTGTCCTTG TATAATATTA

3441 TGGGGTGGAG GCGGGTGGTA TGGAGCAAGG GGCCCAAGTT AACTTGTTTA TTGCAGCTTA TAATGGTTAC AAATAAAGCA

3521 ATAGCATCAC AAATTTCACA AATAAAGCAT TTTTTTCACT GCATTCTAGT TGTGGTTTGT CCAAACTCAT CAATGTATCT

3601 TATCATGTCT GGATCCGCTT CAGGCACCGG GCTTGCGGGT CATGCACCAG GTGCGCGGTC CTTCGGGCAC CTCGACGTCG

3681 GCGGTGACGG TGAAGCCGAG CCGCTCGTAG AAGGGGAGGT TGCGGGGCGC GGAGGTCTCC AGGAAGGCGG GCACCCCGGC

3761 GCGCTCGGCC GCCTCCACTC CGGGGAGCAC GACGGCGCTG CCCAGACCCT TGCCCTGGTG GTCGGGCGAG ACGCCGACGG

3841 TGGCCAGGAA CCACGCGGGC TCCTTGGGCC GGTGCGGCGC CAGGAGGCCT TCCATCTGTT GCTGCGCGGC CAGCCTGGAA

3921 CCGCTCAACT CGGCCATGCG CGGGCCGATG TCGGCGAACA CCGCCCCCGC TTCGACGCTC TCCGGCGTGG TCCAGACCGC

4001 CACCGCGGCG CCGTCGTCCG CGACCCACAC CTTGCCGATG TCGAGCCCGA CGCGCGTGAG GAAGAGTTCT TGCAGCTCGG

4081 TGACCCGCTC GATGTGGCGG TCCGGGTCGA CGGTGTGGCG CGTGGCGGGG TAGTCGGCGA ACGCGGCGGC GAGGGTGCGT

4161 ACGGCCCGGG GGACGTCGTC GGGGGTGGCG AGGCGCACCG TGGGCTTGTA CTCGGTCATG GTGGCCTGCA GAGTCGCTCT

4241 GTGTTCGAGG CCACACGCGT CACCTTAATA TGCGAAGTGG ACCTGGGACC GCGCCGCCCC GACTGCATCT GCGTGTTTTC

4321 GCCAATGACA AGACGCTGGG CGGGGTTTGT GTCATCATAG AACTAAAGAC ATGCAAATAT ATTTCTTCCG GGGACACCGC

4401 CAGCAAACGC GAGCAACGGG CCACGGGGAT GAAGCAGCTG CGCGACTCCC TGAAGATCCC CCTTATTAAC CCTAAACGGG

4481 TAGCATATGC TTGCCGGGTA GTAGTATATA CTATCCAGAC TAACCCTAAT TCAATAGCAT ATGTTACCCA ACGGGAAGCA

4561 TATGCTATCG AATTAGGGTT AGTAAAAGGG TCCTAAGGAA CAGCGATCTG GATAGCATAT GCTATCCTAA TCTATATCTG

4641 GGTAGCATAT GCTATCCTAA TCTATATCTG GGTAGCATAG GCTATCCTAA TCTATATCTG GGTAGCATAT GCTATCCTAA

4721 TCTATATCTG GGTAGTATAT GCTATCCTAA TTTATATCTG GGTAGCATAG GCTATCCTAA TCTATATCTG GGTAGCATAT

4801 GCTATCCTAA TCTATATCTG GGTAGTATAT GCTATCCTAA TCTGTATCCG GGTAGCATAT GCTATCCTCA TGCATATACA

4881 GTCAGCATAT GATACCCAGT AGTAGAGTGG GAGTGCTATC CTTTGCATAT GCCGCCACCT CCCAAGGAGA TCCGCATGTC

4961 TGATTGCTCA CCAGGTAAAT GTCGCTAATG TTTTCCAACG CGAGAAGGTG TTGAGCGCGG AGCTGAGTGA CGTGACAACA

5041 TGGGTATGCC CAATTGCCCC ATGTTGGGAG GACGAAAATG GTGACAAGAC AGATGGCCAG AAATACACCA ACAGCACGCA 

5121 TGATGTCTAC TGGGGATTTA TTCTTTAGTG CGGGGGAATA CACGGCTTTT AATACGATTG AGGGCGTCTC CTAACAAGTT

5201 ACATCACTCC TGCCCTTCCT CACCCTCATC TCCATCACCT CCTTCATCTC CGTCATCTCC GTCATCACCC TCCGCGGCAG 

5281 CCCCTTCCAC CATAGGTGGA AACCAGGGAG GCAAATCTAC TCCATCGTCA AAGCTGCACA CAGTCACCCT GATATTGCAG

5361 GTAGGAGCGG GCTTTGTCAT AACAAGGTCC TTAATCGCAT CCTTCAAAAC CTGAGCAAAT ATATGAGTTT GTAAAAAGAC 

5441 CATGAAATAA CAGACAATGG ACTCCCTTAG CGGGCCAGGT TGTGGGCCGG GTCCAGGGGC CATTCCAAAG GGGAGACGAC

5521 TCAATGGTGT AAGACGACAT TGTGGAATAG CAAGGGCAGT TCCTCGCCTT AGGTTGTAAA GGGAGGTCTT ACTACCTCCA 

5601 TATACGAACA CACCGGCGAC CCAAGTTCCT TCGTCGGTAG TCCTTTCTAC GTGACTCCTA GCCAGGAGAG CTCTTAAACC 

5681 TTCTGCAATG TTCTCAAATT TCGGGTTGGA ACCTCCTTGA CCACGATGCT TTCCAAACCA CCCTCCTTTT TTGCGCCTGG

5761 CTCCATCACC CTGACCCCGG GGTCCAGTGC TTGGGCCTTC TCCTGGGTCA TCTGCGGGGC CCTGCTCTAT CGCTCCCGGG

5841 GGCACGTCAG GCTCACCATC TGGGCCACCT TCTTGGTGGT ATTCAAAATA ATCGGCTTCC CCTACAGGGT GGAAAAATGG 

5921 CCTTCTACCT GGAGGGGGCC TGCGCGGTGG AGACCCGGAT GATGATGACT GACTACTGGG ACTCCTGGGC CTGTTTTCTC 

6001 CACGTCCACG ACCTCTCCCC CTGGCTCTTT CACGACTTCC CCCCCTGGCT CTTTCACGTC CTCTACCCCG GCGGCCTCCA

6081 CTACCTCCTC GACCCCGGCC TCCACTACCT CCTCGACCCC GGCCTCCACT GCCTCCTCGA CCCCGGCCTC CGGCACCTCC

6161 TCCAGCCCCA GCACCTCCAC CAGCCCCAGC TCCCCCAGCT CCAGCCCCAC CAGCACCAGG CCCTCCAGCC CCACCAGCCC 
6241 CAGCCCCTCC GGCACCTCCT CCAGCCCCAG CACCTCCACC AGCCCCAGCT CCCCCAGCTC CAGCCCCACC AGCACCAGCC 

6321 CCTCCAGCCC CACCAGCCCC AGCCCCTCCT GTTCCACCGT GGGTCCCTTT GCAGCCAATG CAACTTGGAC GTTTTTGGGG 

6401 TCTCCGGACA CCATCTCTAT GTCTTGGCCC TGATCCTGAG CCGCCCGGGG CTCCTGGTCT TCCGCCTCCT CGTCCTCGTC

6481 CTCTTCCCCG TCCTCGTCCA TGGTTATCAC CCCCTCTTCT TTGAGGTCCA CTGCCGCCGG AGCCTTCTGG TCCAGATGTG 

6561 TCTCCCTTCT CTCCTAGGCC ATTTCCAGGT CCTGTACCTG GCCCCTCGTC AGACATGATT CACACTAAAA GAGATCAATA 

6641 GACATCTTTA TTAGACGACG CTCAGTGAAT ACAGGGAGTG CAGACTCCTG CCCCCTCCAA CAGCCCCCCC ACCCTCATCC

6721 CCTTCATGGT CGCTGTCAGA CAGATCCAGG TCTGAAAATT CCCCATCCTC CGAACCATCC TCGTCCTCAT CACCAATTAC 

6801 TCGCAGCCCG GAAAACTCCC GCTGAACATC CTCAAGATTT GCGTCCTGAG CCTCAAGCCA GGCCTCAAAT TCCTCGTCCC 

6881 CCTTTTTGCT GGACGGTAGG GATGGGGATT CTCGGGACCC CTCCTCTTCC TCTTCAAGGT CACCAGACAG AGATGCTACT 

6961 GGGGCAACGG AAGAAAAGCT GGGTGCGGCC TGTGAAGGTA AGATCTGTCG ACATCGATGG GCGCGGGTGT ACACTCCGCC 

7041 CATCCCGCCC CTAACTCCGC CCAGTTCCGC CCATTCTCCG CCTCATGGCT GACTAATTTT TTTTATTTAT GCAGAGGCCG

7121 AGGCCGCCTC GGCCTCTGAG CTATTCCAGA AGTAGTGAGG AGGCTTTTTT GGAGGCCTAG GCTTTTGCAA AAAGCTAATT

7201 C 



Figure 14
BLASTP v NCBI nr

Query= INSP037.pep
         (78 letters)

Database: All non-redundant GenBank CDS translations+PDB+SwissProt+PIR+PRF
           1,446,218 sequences; 465,230,387 total letters

Searching......done

                                                                       Score    E
Sequences producing significant alignments:                            (bits) Value

ref|XP_211857.1| hypothetical protein XP_211857 [Homo sapiens]           109  8e-24
ref|XP_112161.2| similar to putative RNA binding protein 1 [Ratt...       38  0.041
ref|XP_220945.1| similar to keratin 21, type I, cytoskeletal - r...       37  0.069
ref|NP_775151.1| cytokeratin 21 [Rattus norvegicus] >gi|125089|s...       37  0.069
gb|AAD49229.2|AF159462_1 EHEC factor for adherence [Escherichia...        35  0.26
gb|AAL57562.1|AF453441_46 Efa1 [Escherichia coli]                         35  0.26
emb|CAB55629.1| lymphostatin [Escherichia coli]                           35  0.26
emb|CAC81883.1| Efa1-LifA-Tox protein [Escherichia coli]                  35  0.26
gb|AAA39399.1| ORF1                                                       35  0.34
pir||T36223 hypothetical protein SCE39.13c - Streptomyces coelic...       34  0.59

>ref|XP_211857.1| hypothetical protein XP_211857 [Homo sapiens]
          Length = 113

 Score = 109 bits (273), Expect = 8e-24
 Identities = 54/74 (72%), Positives = 63/74 (84%)

Query: 1  MTSPNELNKLPWTNPGETEICDLSDTEFKISVLKNLKEIQDNTEKESRILSDKYKKQIEI 60
          MTSPNELN+ P TNP ETEIC++ D EFKI+VL+ L EIQDNTEKE ++LSDK  K+IEI
Sbjct: 1  MTSPNELNEAPGTNPAETEICNILDREFKIAVLRKLNEIQDNTEKELKVLSDKIIKEIEI 60

Query: 61 IKGNQAEILELRNA 74
          IK NQAEILEL+NA
Sbjct: 61 IKMNQAEILELKNA 74
Figure 15

BLAST v month-aa

Query= INSP037.pep
         (78 letters)

Database: NCBI: Rolling month (30 days) of new/revised protein sequences
           37,755 sequences; 14,558,446 total letters

Searching......done

                                                                       Score    E
Sequences producing significant alignments:                            (bits) Value

ref|XP_141262.1| similar to NAG14 protein [Homo sapiens] [Mus mu...       30  0.27
ref|NP_831679.1| Phage-related protein [Bacteriophage phBC6A51]...        30  0.36
ref|NP_083191.1| RIKEN cDNA 1200008A14 [Mus musculus] >gi|128359...       29  0.61
ref|NP_852012.1| neck appendage [Streptococcus phage C1] >gi|309...       28  0.80
ref|[accession partly illegible: ...648].1| neurexin I; neurexin I beta; neurexin I alpha; ...   28  1.0
ref|XP_319358.1| ENSANGP00000006161 [Anopheles gambiae] >gi|2130...       28  1.0
ref|XP_308412.1| ENSANGP00000019827 [Anopheles gambiae] >gi|2129...       28  1.0
ref|NP_196806.2| expressed protein [Arabidopsis thaliana]                 27  1.8
gb|AAL29689.1| Snf2-related chromatin remodeling factor SRCAP [T...        27  1.8
ref|XP_314825.1| ENSANGP00000011098 [Anopheles gambiae] >gi|2129...       27  1.8
ref|XP_311503.1| ENSANGP00000013657 [Anopheles gambiae] >gi|2129...       27  2.3

>ref|XP_141262.1| similar to NAG14 protein [Homo sapiens] [Mus musculus]
 ref|XP_230311.1| similar to NAG14 protein [Homo sapiens] [Rattus norvegicus]
 ref|NP_848840.1| RIKEN cDNA 6430556C10 gene [Mus musculus]
 dbj|BAC28656.1| unnamed protein product [Mus musculus]
 dbj|BAC33302.1| unnamed protein product [Mus musculus]
          Length = 640

 Score = 30.0 bits (66), Expect = 0.27
 Identities = 22/59 (37%), Positives = 33/59 (55%), Gaps = 8/59 (13%)

Query: 20 ICDLSDTEFK- ISVLKNLKEIQDNTEKESRILSDKYKKQIEI IKGN- QAEILEL 71 

+C S+ K I V KML+E+ D +R+L + ++ QI+IIK N EIL+L 

Sbjct: 50 VCSCSNQFSKVICVRKNLREVPDGISTNTRLL-NLHENQIQIIKWSFKHLRHLEILQL 107 
Figure 16A

TBLASTN v NCBI nt-month

Query= INSP037.pep
         (78 letters)

Database: NCBI: Rolling month (30 days) of new/revised nt sequences
(GenBank+EMBL+DDBJ sequences (but no EST, STS, GSS, or phase 0, 1 or 2 HTGS sequences))
           44,426 sequences; 216,324,491 total letters

Searching......done

                                                                       Score    E
Sequences producing significant alignments:                            (bits) Value

gb|AC093724.3| Homo sapiens BAC clone RP11-1L5 from 2, complete ...       105  2e-23
emb|BX510371.4| Human DNA sequence from clone RP13-728A10 on chr...        89  2e-18
gb|AC144561.8| Homo sapiens 3 BAC RP11-628C23 (Roswell Park Canc...        82  4e-16
dbj|AP001827.5| Homo sapiens genomic DNA, chromosome 11 clone:RP...        80  1e-15
emb|Z97632.3|HS196E23 Human DNA sequence from clone RP1-196E23 o...        66  3e-11
emb|BX322234.7| Human DNA sequence from clone XXyac-65C7_A on ch...        62  5e-10
dbj|AP005138.3| Homo sapiens genomic DNA, chromosome 18 clone:RP...        54  1e-07
dbj|AP006292.2| Homo sapiens genomic DNA, chromosome 9 clone:RP1...        54  1e-07
gb|AC083903.10| Homo sapiens chromosome UNK clone RP11-785G23, c...        47  1e-05
gb|AY293855.1| Homo sapiens insulin-like growth factor 2 recepto...        45  7e-05

>gb|AC093724.3| Homo sapiens BAC clone RP11-1L5 from 2, complete sequence
          Length = 161617

 Score = 105 bits (263), Expect = 2e-23
 Identities = 55/78 (70%), Positives = 62/78 (78%)
 Frame = -3

Query: 1     MTSPNELNKLPWTNPGETEICDLSDTEFKISVLKNLKEIQDNTEKESRILSDKYKKQIEI 60
             MTSPNELNK P  NP ET++CDLS  EFKI+VL+ LKEIQDNTEK  RILSDK+ K IEI
Sbjct: 22538 MTSPNELNKAPRINPQETKLCDLSHGEFKIAVLRKLKEIQDNTEKGFRILSDKFNKDIEI 22359

Query: 61    IKGNQAEILELRNADGTL 78
             I   +AEILEL+NA G L
Sbjct: 22358 IFKTRAEILELKNAIGIL 22305

Score = 30.0 bits (66), Expect = 1.7 

Identities = 19/60 (31%), Positives = 35/60 (57%) 

Frame = +3 

Query: 14 NPGETE I CDLSDTEFKI SVLKNLKE I QDNTEKESRI LSDKYKKQ I E I I KGNQAE I LELRN 73 

+P + EI DLS+ EFK+ V+K ++B + E + + K +K 1+ +KG + ++ N 
Sbjct: 111237 DPNKEEITDLSEKEFKL- VI KLI REGPEKGEAQCK KIQKVIQ*VKGETFKEIDSLN 111401 



Figure 16B
TBLASTN v NCBI nt

Query= INSP037.pep
         (78 letters)

Database: All GenBank+EMBL+DDBJ+PDB sequences (but no EST, STS, GSS,
or phase 0, 1 or 2 HTGS sequences)
           1,794,754 sequences; 8,367,844,792 total letters

Searching......done

                                                                       Score    E
Sequences producing significant alignments:                            (bits) Value

gb|AC112641.3| Homo sapiens 3 BAC RP11-431I8 (Roswell Park Cance...       158  2e-37
gb|AC026118.17| Homo sapiens 3 BAC RP11-67F24 (Roswell Park Canc...       158  2e-37
emb|AL020989.2|HS192P9 Human DNA sequence from clone RP1-192P9 o...       117  3e-25
gb|AC009811.14| Homo sapiens chromosome 3, clone RP11-491K7, com...       116  7e-25
gb|AC108166.5| Homo sapiens BAC clone RP11-724L20 from 4, comple...       115  9e-25
gb|AC011299.3|AC011299 Homo sapiens BAC clone RP11-232C20 from 7...       115  1e-24
gb|AC144613.1| Pan troglodytes chromosome 7 clone RP43-1F6, comp...       115  1e-24
dbj|AP001992.4| Homo sapiens genomic DNA, chromosome 11q clone:R...       115  1e-24
emb|AL359393.9| Human DNA sequence from clone RP11-338I3 on chro...       114  2e-24
emb|AL353577.22| Human DNA sequence from clone RP11-661K19 on ch...       114  2e-24

>gb|AC112641.3| Homo sapiens 3 BAC RP11-431I8 (Roswell Park Cancer Institute Human BAC
Library) complete sequence
          Length = 165619

 Score = 158 bits (399), Expect = 2e-37
 Identities = 78/78 (100%), Positives = 78/78 (100%)
 Frame = +3

Query: 1     MTSPNELNKLPWTNPGETEICDLSDTEFKISVLKNLKEIQDNTEKESRILSDKYKKQIEI 60
             MTSPNELNKLPWTNPGETEICDLSDTEFKISVLKNLKEIQDNTEKESRILSDKYKKQIEI
Sbjct: 47052 MTSPNELNKLPWTNPGETEICDLSDTEFKISVLKNLKEIQDNTEKESRILSDKYKKQIEI 47231

Query: 61    IKGNQAEILELRNADGTL 78
             IKGNQAEILELRNADGTL
Sbjct: 47232 IKGNQAEILELRNADGTL 47285
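In a TBLASTN hit the subject coordinates are nucleotide positions, so the span they cover should equal three times the number of aligned residues. The sketch below checks this for the hit above; the function names are hypothetical, and the plus-strand frame rule `(start - 1) % 3 + 1` is stated here as an assumption about how the reported frame relates to the subject start position.

```python
def residues_spanned(nt_start: int, nt_end: int) -> int:
    """Number of codons covered by an inclusive nucleotide interval."""
    span = abs(nt_end - nt_start) + 1
    if span % 3 != 0:
        raise ValueError("nucleotide span is not a whole number of codons")
    return span // 3


def plus_strand_frame(nt_start: int) -> int:
    """Reading frame (1, 2 or 3) for a plus-strand hit starting at nt_start (assumed convention)."""
    return (nt_start - 1) % 3 + 1


# Subject coordinates of the 100% identity hit in AC112641.3.
SBJCT_START, SBJCT_END = 47052, 47285

n_residues = residues_spanned(SBJCT_START, SBJCT_END)
frame = plus_strand_frame(SBJCT_START)
print(n_residues, frame)
```

Under these assumptions the interval covers 78 codons, matching the 78-letter query, and the computed frame agrees with the Frame = +3 line of the report.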
Figure 17
Figure 18